[FFL-2449] Add server-side flag evaluation metrics documentation #37257
vjfridge wants to merge 12 commits
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b42c572b5
| @@ -0,0 +1,146 @@ | |||
| --- | |||
| title: Set Up Server-Side Flag Evaluation Metrics | |||
| @@ -0,0 +1,72 @@ | |||
| --- | |||
| title: Feature Flag Graphs | |||
| @@ -8,6 +8,9 @@ further_reading: | |||
| - link: "/remote_configuration/" | |||
| tag: "Documentation" | |||
…ng links Remove DD_METRICS_OTEL_ENABLED from all server SDK pages and replace with comments pointing to the setup guide, matching the _index.md pattern. Add further_reading links to server_flag_evaluation_metrics and flag_graphs on all SDK pages.
| @@ -8,6 +8,12 @@ further_reading: | |||
| - link: "/tracing/trace_collection/dd_libraries/dotnet-core/" | |||
| tag: "Documentation" | |||
| text: ".NET Tracing" | |||
| @@ -8,6 +8,12 @@ further_reading: | |||
| - link: "/tracing/trace_collection/dd_libraries/go/" | |||
| tag: "Documentation" | |||
| @@ -8,10 +8,18 @@ further_reading: | |||
| - link: "/tracing/trace_collection/automatic_instrumentation/dd_libraries/java/" | |||
| tag: "Documentation" | |||
| text: "Java APM and Distributed Tracing" | |||
| @@ -11,6 +11,12 @@ further_reading: | |||
| - link: "/tracing/" | |||
| tag: "Documentation" | |||
| text: "Learn about Application Performance Monitoring (APM)" | |||
| @@ -8,6 +8,12 @@ further_reading: | |||
| - link: "/tracing/trace_collection/dd_libraries/python/" | |||
| tag: "Documentation" | |||
| text: "Python Tracing" | |||
| @@ -11,6 +11,12 @@ further_reading: | |||
| - link: "/tracing/" | |||
| tag: "Documentation" | |||
| text: "Learn about Application Performance Monitoring (APM)" | |||
sameerank left a comment
Thanks for taking this on! I did some cross-referencing and verifying to tighten up the information. Feel free to let me know if anything is unclear
| {{< code-block lang="bash" >}} | ||
| # gRPC endpoint (port 4317) | ||
| DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317 | ||
|
|
||
| # HTTP endpoint (port 4318) | ||
| DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT=0.0.0.0:4318 | ||
| {{< /code-block >}} |
A lot of this is already covered in otlp_ingest_in_the_agent.md
So it might be better to just link to that doc, which I assume is the canonical one, instead of duplicating it here
|
|
||
| You only need to enable the protocol your application uses. Both gRPC and HTTP are shown for reference. | ||
|
|
||
| <div class="alert alert-info">If you are running Agent v7.61.0 or later in Docker, set <code>HOST_PROC=/proc</code> on the Agent container to work around a known issue with the OTLP pipeline.</div> |
Also covered here
| OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://<AGENT_HOST>:4318/v1/metrics | ||
|
|
||
| # Or use gRPC (no path suffix): | ||
| # OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://<AGENT_HOST>:4317 |
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is valid, but the canonical docs seem to prefer OTEL_EXPORTER_OTLP_ENDPOINT, which I believe accomplishes the same
documentation/content/en/opentelemetry/setup/otlp_ingest_in_the_agent.md
Lines 328 to 338 in ba55c9a
I'm actually not sure which one to use. OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is more granular and only applies to OTLP for metrics, while OTEL_EXPORTER_OTLP_ENDPOINT covers all OTLP
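For reference, a minimal sketch of how the two variables relate when the protocol is HTTP, following the OpenTelemetry exporter specification: the signal-specific variable is used as-is, while the generic one is treated as a base URL with `/v1/metrics` appended. `resolve_metrics_url` is a hypothetical helper written for illustration, not something the tracers expose, and individual tracers may apply their own defaults.

```bash
# Hypothetical helper: prints the URL an HTTP OTLP metrics exporter would target.
# Assumes http/protobuf; for gRPC no path is appended.
resolve_metrics_url() {
  # The signal-specific variable takes precedence and is used unchanged.
  if [ -n "${OTEL_EXPORTER_OTLP_METRICS_ENDPOINT:-}" ]; then
    printf '%s\n' "$OTEL_EXPORTER_OTLP_METRICS_ENDPOINT"
    return 0
  fi
  # The generic variable is a base URL; the spec default is http://localhost:4318.
  local base="${OTEL_EXPORTER_OTLP_ENDPOINT:-http://localhost:4318}"
  # Strip one trailing slash, then append the per-signal path.
  printf '%s/v1/metrics\n' "${base%/}"
}
```

So with only `OTEL_EXPORTER_OTLP_ENDPOINT=http://<AGENT_HOST>:4318` set, metrics would go to the same URL as the more granular `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://<AGENT_HOST>:4318/v1/metrics`.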
Also the default OTLP protocol is http/protobuf; pointing at :4317 without OTEL_EXPORTER_OTLP_PROTOCOL=grpc sends HTTP to the gRPC port and fails. So we also need to mention the protocol var for the gRPC option
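The mismatch described above could be caught with a small pre-flight check along these lines. This is only a sketch: `check_otlp_protocol` is a hypothetical helper, it assumes the conventional ports (4317 for gRPC, 4318 for HTTP), and it assumes `http/protobuf` when no protocol is given, which may not hold for every tracer.

```bash
# Hypothetical helper: checks that an OTLP endpoint's port matches the protocol.
# Usage: check_otlp_protocol "$OTEL_EXPORTER_OTLP_ENDPOINT" "$OTEL_EXPORTER_OTLP_PROTOCOL"
check_otlp_protocol() {
  local endpoint="$1"
  # Assumed default when the protocol variable is unset or empty.
  local protocol="${2:-http/protobuf}"
  case "$endpoint" in
    *:4317|*:4317/*)
      if [ "$protocol" != "grpc" ]; then
        echo "mismatch: port 4317 expects OTEL_EXPORTER_OTLP_PROTOCOL=grpc" >&2
        return 1
      fi
      ;;
    *:4318|*:4318/*)
      if [ "$protocol" = "grpc" ]; then
        echo "mismatch: port 4318 expects an HTTP protocol such as http/protobuf" >&2
        return 1
      fi
      ;;
  esac
  # Non-standard ports are not checked and fall through to here.
  echo "ok"
}
```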
| ## Step 2: Configure your application | ||
|
|
||
| Set the following environment variables on your application in addition to the standard [server-side feature flag configuration][1]: | ||
|
|
||
| {{< code-block lang="bash" >}} | ||
| # Enable flag evaluation metrics | ||
| DD_METRICS_OTEL_ENABLED=true | ||
|
|
||
| # Point OTLP metrics at the Datadog Agent | ||
| # HTTP endpoint (note the /v1/metrics path suffix): | ||
| OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://<AGENT_HOST>:4318/v1/metrics | ||
|
|
||
| # Or use gRPC (no path suffix): | ||
| # OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://<AGENT_HOST>:4317 | ||
| {{< /code-block >}} | ||
|
|
||
| Replace `<AGENT_HOST>` with the hostname or IP address of your Datadog Agent. In a Docker Compose setup, this is typically the Agent container's service name. |
In most cases, the endpoints don't need to be set. These variables were used in ffe-dogfooding mainly to control if the metrics were being sent to the agent vs. a special container for reporting the counts in http://localhost:8080/dashboard
Suggested change:
| ## Step 2: Configure your application | |
| Set the following environment variable on your application, in addition to the | |
| standard [server-side feature flag configuration][1]: | |
| {{< code-block lang="bash" >}} | |
| # Enable flag evaluation metrics | |
| DD_METRICS_OTEL_ENABLED=true | |
| {{< /code-block >}} | |
| By default, most tracers send OTLP metrics to the Agent at `DD_AGENT_HOST` on port | |
| `4318`. If your application already sets `DD_AGENT_HOST` to reach the Agent, no | |
| endpoint configuration is required. | |
| Set an OTLP endpoint explicitly in either of these cases: | |
| - The Agent is not reachable at `DD_AGENT_HOST` on the default OTLP port (for example, | |
| a remote Agent or a non-default port). | |
| - You use the **Java** tracer. The Java tracer does not derive the endpoint from | |
| `DD_AGENT_HOST`; it defaults to `localhost:4318`. Set the endpoint whenever the | |
| Agent is not on `localhost`. | |
| To set the endpoint, use the standard OpenTelemetry variable: | |
| {{< code-block lang="bash" >}} | |
| # Point OTLP data at the Datadog Agent (HTTP, port 4318) | |
| OTEL_EXPORTER_OTLP_ENDPOINT=http://<AGENT_HOST>:4318 | |
| # Or use gRPC (port 4317). The default protocol is http/protobuf, so you must also | |
| # set the protocol to grpc when using the gRPC port: | |
| # OTEL_EXPORTER_OTLP_ENDPOINT=http://<AGENT_HOST>:4317 | |
| # OTEL_EXPORTER_OTLP_PROTOCOL=grpc | |
| {{< /code-block >}} | |
| Replace `<AGENT_HOST>` with the hostname or IP address of your Datadog Agent. In a | |
| Docker Compose setup, this is typically the Agent container's service name. To set the | |
| metrics endpoint independently of other OTLP signals, use | |
| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` instead, and append the `/v1/metrics` path for HTTP. | |
| Before setting up flag evaluation metrics: | ||
|
|
||
| - Server-side feature flags are already configured. See [Server-Side Feature Flags][1]. | ||
| - Datadog Agent 7.55 or later is running. |
The docs say:
> Since versions 6.32.0 and 7.32.0
I don't see this number in ffe-dogfooding either, so this might be over-restrictive unless there's some other source for this
| | Language | Minimum tracer version | | ||
| | -------- | ---------------------- | | ||
| | .NET | 3.44.0 | | ||
| | Go | 2.8.0 | | ||
| | Java | 1.62.0 | | ||
| | Node.js | 5.99.0 | | ||
| | Python | 4.7.0 | | ||
| | Ruby | 2.32.0 | |
We can add PHP soon DataDog/dd-trace-php#3911
And according to DataDog/system-tests#7033 we expect it in the not-yet-released v1.21.1
| 1. Go to [Metrics Explorer][2] and search for `feature_flag.evaluations`. | ||
| 2. If the metric does not appear within a few minutes of your application evaluating flags, check: | ||
| - The Agent OTLP receiver is enabled and the correct port is exposed. | ||
| - `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` points to the Agent, not a separate collector. |
Might also update this depending on what you update in step 2 re: OTEL_EXPORTER_OTLP_ENDPOINT vs OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
| - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317 | ||
| - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT=0.0.0.0:4318 | ||
| - HOST_PROC=/proc # Required for Agent v7.61.0+ running in Docker |
You only need one or the other, but this reads like both are required. We needed both in ffe-dogfooding because Python defaults to gRPC and the rest of the SDKs use HTTP
"# Required for Agent v7.61.0+ running in Docker" is a bit of an overstatement because it's one of a few workarounds for a known issue. I'd rephrase to "If running Agent v7.61.0+ in Docker"
And actually that distinction between Python/gRPC vs. the rest is probably worth mentioning somewhere
Maybe an additional bullet in the Step 2 rewrite?
- The **Python** tracer defaults to the gRPC protocol (Agent OTLP port `4317`), whereas the other tracers default to HTTP (port `4318`). Make sure the Agent receiver port you enabled in Step 1 matches, or set `OTEL_EXPORTER_OTLP_PROTOCOL` and the endpoint explicitly.
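The defaults described in this thread can be summarized in a short sketch. `default_otlp_target` is a hypothetical helper, the values simply restate the claims made in this review (they are not verified against each tracer's source), and falling back to `localhost` when `DD_AGENT_HOST` is unset is an assumption.

```bash
# Hypothetical helper: prints "<protocol> <host>:<port>" for a tracer language,
# based on the defaults described in this review thread.
default_otlp_target() {
  local lang="$1"
  # Assumption: fall back to localhost when DD_AGENT_HOST is unset.
  local agent_host="${DD_AGENT_HOST:-localhost}"
  case "$lang" in
    # Python defaults to gRPC, so the Agent must expose port 4317.
    python) echo "grpc ${agent_host}:4317" ;;
    # Java does not derive the endpoint from DD_AGENT_HOST.
    java)   echo "http/protobuf localhost:4318" ;;
    # The other tracers default to HTTP on port 4318.
    *)      echo "http/protobuf ${agent_host}:4318" ;;
  esac
}
```

For example, with `DD_AGENT_HOST=dd-agent`, `default_otlp_target python` prints `grpc dd-agent:4317`, which is why the Agent receiver enabled in Step 1 has to match the tracer in use.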
|
|
||
|
|
Assuming 2 blank lines was unintentional
| | `feature_flag.key` | The flag key being evaluated | | ||
| | `feature_flag.result.variant` | The variant returned by the evaluation | | ||
| | `feature_flag.result.reason` | The reason for the evaluation result | | ||
| | `feature_flag.result.allocation_key` | The targeting rule id | |
All six SDKs also attach `error.type` on error evaluations. This is an OTel standard key: https://opentelemetry.io/docs/specs/semconv/registry/attributes/error/ Also worth noting that `allocation_key` is emitted only when present, i.e. it's conditional.
Also, technically "targeting rule id" is not the right description for an `allocation_key` because the relationship is one-to-many: an allocation can contain multiple targeting rules. I don't think we officially have a definition anywhere, and I don't think this table is the right place to get into it in any meaningful depth, so maybe we can go with "The identifier for the evaluated allocation"?
What does this PR do? What is the motivation?
Fixes FFL-2449
Adds public documentation for setting up server-side flag evaluation metrics, which were previously undocumented beyond a one-liner env var mention. The setup requires enabling the Datadog Agent OTLP receiver and pointing the application at it — neither of which existed in any public docs.
Changes
- `feature_flags/guide/server_flag_evaluation_metrics`: step-by-step setup for Agent OTLP receiver, application env vars, metric verification, Historical Metrics retention, and a dashboard query reference. Includes a minimum tracer version table per SDK and marks `feature_flag.evaluations` as experimental.
- `feature_flags/concepts/flag_graphs`: describes the graphs on the flags list page and flag details page (targeting rule distribution, server evaluations, client evaluations, errors/latency, export to dashboard) for both client and server SDKs.
- `feature_flags/server/_index.md`: existing `DD_METRICS_OTEL_ENABLED` alert updated to note the metric is experimental and link to the new guide.
- `getting_started/feature_flags/_index.md`: Step 5 now references the new metrics guide for server-side apps.
- `feature_flags/guide/_index.md` and `feature_flags/concepts/_index.md`: new pages added to navigation indexes.
- Server SDK pages (`dotnet`, `go`, `java`, `nodejs`, `python`, `ruby`): added experimental warning alert next to the `DD_METRICS_OTEL_ENABLED` env var.

Merge instructions
Merge readiness: